Future space transportation architectures and designs must be affordable. Consequently, their 
Life Cycle Cost (LCC) must be controlled. For the LCC to be controlled, it is necessary to identify all 
the requirements and elements of the architecture at the beginning of the concept phase. Controlling 
LCC requires the establishment of the major operational cost drivers. Two of these major cost 
drivers are reliability and maintainability, in other words, the system’s availability (responsiveness). 
Potential reasons that may drive the inherent availability requirement are the need to control the 
number of unique parts and the spare parts required to support the transportation system’s 
operation. For more typical space transportation systems used to place satellites in space, the 
productivity of the system will drive the launch cost. This system productivity is the resultant output 
of the system availability. 

Availability is equal to the mean uptime divided by the sum of the mean uptime plus the mean 
downtime. Since many operational factors cannot be projected early in the definition phase, the focus 
will be on inherent availability, which is equal to the mean time between failures (MTBF) divided by 
the sum of the MTBF and the mean time to repair (MTTR) the system. The MTBF is a function of reliability or 
the expected frequency of failures. When the system experiences failures the result is added 
operational flow time, parts consumption, and increased labor with an impact to responsiveness 
resulting in increased LCC. The other function of availability is the MTTR, or maintainability. In 
other words, how accessible is the failed hardware that requires replacement and what operational 
functions are required before and after change-out to make the system operable. This paper will 
describe how the MTTR can be equated to additional labor, additional operational flow time, and 
additional structural access capability, all of which drive up the LCC. 

A methodology will be presented that provides the decision makers with the understanding 
necessary to place constraints on the design definition. This methodology for the major drivers will 
determine the inherent availability, safety, reliability, maintainability, and the life cycle cost of the 
fielded system. This methodology will focus on the achievement of an affordable, responsive space 
transportation system. 

It is the intent of this paper to not only provide the visibility of the relationships of these major 
attribute drivers (variables) to each other and the resultant system inherent availability, but also to 
provide the capability to bound the variables, thus providing the insight required to control the 
system’s engineering solution. An example of this visibility is the need to provide integration of 
similar discipline functions to allow control of the total parts count of the space transportation 
system. Also, selecting a reliability requirement will place a constraint on parts count to achieve a 
given inherent availability requirement, or require accepting a larger parts count with the resulting 
higher individual part reliability requirements. This paper will provide an understanding of the 
relationship of mean repair time (mean downtime) to maintainability (accessibility for repair), and 
both mean time between failure (reliability of hardware) and the system inherent availability. 
Nomenclature 

Ai     = inherent availability 
Aa     = achieved availability 
Ao     = operational availability 
DDT&E  = design, development, test & evaluation 
IFA    = in-flight anomaly 
LCC    = life cycle cost 
LRU    = line replaceable unit 
MRB    = materials review board 
MTBF   = mean time between failure 
MTTR   = mean time to repair 
λ      = failure rate or the reciprocal of the MTBF 
r      = number of failures or repairs 
N      = total parts count 
t      = system exposure time 
PR     = problem report 
Pr     = probability 
SPST   = space propulsion synergy team 
TPM    = technical performance metric 


I. Introduction 

It is essential that management and engineering understand the need for a derived availability 
requirement for the customer’s space transportation system that is linked to life cycle cost (LCC). It is 
also essential to give engineering and management visibility of the several variables that determine 
the availability values required to enable key goals and objectives, such as controlling LCC, as shown in 
Figure 1. The relationship of the variables driving the availability needs must be understood by all decision 
makers involved. This paper addresses inherent availability, which treats the mean downtime as 
the mean time to repair (the time to determine the failed article, remove it, install a replacement article, and 
verify the functionality of the repaired system). Likewise, with inherent availability the mean uptime 
considers only the mean time between failures that require repair for the system to be functional (by 
contrast, another form of availability uses the mean time between maintenance, which includes both 
preventive and corrective maintenance). It is also essential that management and engineering understand 
how all influencing attributes relate to each other and to the resultant inherent availability requirement. 
Figure 1 provides a visual influence diagram of these relationships. This visibility gives the decision makers 
the understanding necessary to place constraints on the design definition for the major drivers that will 
determine the inherent availability, safety, reliability, maintainability, and life cycle cost of the fielded 
system provided to the customer. The inherent availability requirement must be driven by the need to 
control the number of unique parts/subsystems and the spare parts required to support the transportation 
system operation. 



Figure 1. Availability influence diagram. 
II. Design to Life Cycle Cost Management 

Design to life cycle cost requires a rigorous process. The foundation must be implemented and 
demonstrated during the early part of the vehicle design program and refined throughout its duration. It is a 
process in which trade-offs among development, operational performance, schedule, risk, DDT&E costs, and 
LCC must be addressed on a continuing basis. The ability to control LCC within stringent total program and 
fiscal constraints must be managed from the start of the design definition phase and carried through 
the operational life of the vehicles developed in the program. Key features of the Design to Life Cycle Cost 
Management approach include: 

1. Cost credibility through the use of extensive cost databases to develop initial values and operation 
cost models to assure the credibility of initial, early estimates. 

2. The ability to assess annual funding constraints while exploring alternative system concepts. 

3. A Design to LCC Management manager reporting directly to the program manager, thereby 
providing a high-level, single point of contact. 

4. A Design to LCC Management process that is an integral part of the performance management system, 
thereby assuring an integrated cost management system that is coupled with the technical performance 
measurement system to enhance the early detection of unfavorable trends. 

5. Cost-effective design solutions through System Engineering control of the technical performance 
and operation cost assessments and constraints. 

6. Early establishment of a realistic budget of cost-driving functions and cost objectives, together with 
highly visible management processes and the accompanying discipline to control and achieve them. 

III. Background for Availability Discussion 

Availability is the probability that a repairable system is operational — thus, availability is a function of 
both reliability and maintainability. Reliability is the probability a system will perform its intended function 
without failure for a specified period of time under specified conditions. Maintainability is the probability 
of restoring or repairing a system within a period of time when maintenance is performed in accordance 
with prescribed procedures. 

Availability, unlike reliability, addresses downtime (i.e., time for maintenance, repair, and replacement 
activities). As with reliability, availability can be either a demonstrated or predictive measure of 
performance. Demonstrated availability is simply (uptime) / (uptime + downtime). Predictive availability 
has three types, namely, at time t (point availability), over an interval from t1 to t2 (interval availability), or 
over the long run as t → ∞ (steady-state availability). 

Steady-state availability has three common forms (with each depending on the definitions of uptime and 
downtime), namely, inherent availability (Ai), achieved availability (Aa), and operational availability (Ao). 
Inherent availability is based solely on the failure (reliability) distribution and the downtime distribution 
(maintainability) and is an important system parameter for concept architectural design definition through 
systems trade studies. 

The maintainability parameter of inherent availability only accounts for the time to diagnose and locate 
the failed article, access and repair it, and verify the functionality of the repaired system. The 
maintainability parameter for achieved availability is the same as for inherent availability except that it 
also includes the time for preventive maintenance. Last, the maintainability parameter for operational 
availability is the same as for achieved availability except that it also includes the time for logistics and 
administrative delays. 

For the purpose of this paper we will only discuss inherent availability (Ai) as shown in Eq. 1, 

Ai = MTBF / (MTBF + MTTR) (1) 

where MTBF is the mean time between failure and MTTR is the mean time to repair. That is, MTBF is the 
average time between system failures (i.e., the average time the system performs its intended function), and 
MTTR is the average downtime required to identify and access the failed article, repair or replace the 
article, and verify the functionality of the repaired system. Stating an availability requirement by itself will 
not accomplish the requirement’s intent, because there are three major drivers that influence and 
enable the achievement of the availability requirement. These drivers are reliability, maintainability, and 
total parts count. The availability requirement and these drivers must be developed and linked 
together to form interdependent requirements. The relationship of these drivers and the desired level of 
inherent availability must be understood by both engineering and management to systematically achieve the 
customer’s needs and goals. The overall productivity of the transportation system 
is also linked to these same three major drivers, as well as to the support labor and material for the system’s 
operation. 


IV. Discussion: Understanding System Availability, its Drivers, and Relationship to Cost 

A. Inherent Availability and its Influencing Attributes 

We will address inherent availability from a design perspective. By emphasizing the importance of the 
key attributes that influence availability, we can control or minimize the need to perform unplanned work 
during space operations, so that work is addressed only during planned downtime for modifications and 
depot-type maintenance. Since inherent availability is a mathematical function of MTBF and MTTR, 
availability is determined by both parameters (drivers) and not by one alone. Thus, reliability and its 
common metric (MTBF) do not equate to availability. For a given lower-bound availability requirement, the 
upper-bound MTTR increases as MTBF increases. Therefore, if the transportation operation cannot 
accommodate the amount of downtime implied by the predicted MTTR requirement, a higher availability 
requirement must be selected. If the opposite approach is taken, reducing MTBF in order to reduce the 
allowable MTTR, the expected number of failures would increase, resulting in more replacement parts for 
the same total downtime. The impact on the operation/mission, however, would be much greater: there 
would be a greater burden on logistics and a higher life cycle cost due to the increased demand for parts. 
Table 1 below illustrates this relationship between the requirements for MTTR and MTBF for different 
availability requirements. The table assumes one system element with an operation/mission time of one 
hour and failures occurring at a constant rate. 
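
The relationship tabulated in Table 1 can be sketched in a few lines of Python. This is only an illustration under the paper's stated assumptions (constant failure rate, so MTBF = -t / ln R, and MTTR = MTBF (1 - A) / A from Eq. 1); the function names are hypothetical and are not part of the original study.

```python
import math


def mtbf_from_reliability(reliability: float, mission_hours: float) -> float:
    """MTBF = -t / ln(R); valid only for a constant failure rate (assumption)."""
    return -mission_hours / math.log(reliability)


def upper_bound_mttr(reliability: float, availability: float,
                     mission_hours: float) -> float:
    """Largest MTTR that still meets the availability: MTTR = MTBF * (1 - A) / A."""
    mtbf = mtbf_from_reliability(reliability, mission_hours)
    return mtbf * (1.0 - availability) / availability


# Reproduce a few Table 1 cells for a one-hour mission.
for r in (0.95, 0.98, 0.99):
    row = [round(upper_bound_mttr(r, a, 1.0), 2) for a in (0.90, 0.98, 0.99)]
    print(f"R = {r}: MTTR upper bounds = {row}")
```

Holding availability fixed, a higher reliability (longer MTBF) permits a longer repair time, which is the trend visible down each column of Table 1.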

Table 1. Availability requirement as a function of the reliability requirement and maintainability 
requirement for a fixed mission time. 

Upper-bound MTTR (hours) for each availability (A) requirement 

System                                                                     MTBF = 
Reliability      90%       94%       98%       99%    99.50%    99.90%    -1/ln R 
0.9500          2.17      1.24      0.40      0.20      0.10      0.02     19.496 
0.9800          5.50      3.16      1.01      0.50      0.25      0.05     49.498 
0.9900         11.06      6.35      2.03      1.01      0.50      0.10     99.499 
0.9940         18.46     10.61      3.39      1.68      0.84      0.17    166.166 
0.9950         22.17     12.73      4.07      2.02      1.00      0.20    199.500 
0.9960         27.72     15.93      5.09      2.52      1.25      0.25    249.500 
0.9980         55.50     31.88     10.19      5.05      2.51      0.50    499.500 
0.9990        111.06     63.80     20.40     10.10      5.02      1.00    999.500 
0.9998        555.50    319.12    102.03     50.50     25.12      5.00   4999.500 
0.9999      1,111.06    638.27    204.07    101.01     50.25     10.01   9999.500 


To understand the relationship between increased hardware failures and reduced reliability, we will 
examine the probability of failure, total parts count, and system reliability. The Poisson distribution can be 
used to predict the probability of an exact number of repair or failure events (r) in the time period (t) of 
interest. However, it assumes a part has a constant repair or failure rate λ (where λ is the reciprocal of 
MTBF) and is immediately repaired or replaced. When the forecast is to determine the likelihood of r or 
fewer failures, the cumulative Poisson distribution can be used to determine this probability (Pr), as 
described in Eq. 2, 

$$\Pr(x \le r) = \sum_{x=0}^{r} \left[ \frac{(N \lambda t)^{x} \, e^{-N \lambda t}}{x!} \right] \qquad (2)$$

where r is the upper bound for the number of failures, N is the total parts count under consideration, λ is the 
failure rate, and t is the time period of interest. Eq. 2 and Table 2 together illustrate the relationship between 
system complexity (parts count) and system reliability, where Table 2 provides the visibility for the 
predicted probability of success of controlling the part failures during the period of time of interest. This 
methodology can be used during design for controlling predicted hardware failures. It 
places bounds on parts count for the system reliability being selected. 


Table 2. System complexity (parts count) shown as a function of system reliability and probability 
of 1 or less failures (events) per one hour time period (mission). 

IF:   Proposed System Has Serial Element Count (N) = 2,000 
      Mission Time (t) = 1 
      Mission's Maximum Failure Count (r) = 1 
Then: Probability Of Success: Failure Count Is r Or Less During t For Various 
      System Complexity Levels (N_ref) Based On λi 

System       System   Element 
Reliability  MTBF     Failure Rate  N_ref = 
(R)                   (λi)            1,000    1,500    2,000    2,500    5,000   10,000   20,000 
0.940         16.2    3.0938E-05    0.99953  0.99896  0.99816  0.99716  0.98920  0.96096  0.87189 
0.945         17.7    2.8285E-05    0.99961  0.99913  0.99846  0.99761  0.99089  0.96680  0.88926 
0.950         19.5    2.5647E-05    0.99968  0.99928  0.99873  0.99803  0.99245  0.97223  0.90585 
0.955         21.7    2.3022E-05    0.99974  0.99942  0.99897  0.99841  0.99386  0.97724  0.92155 
0.960         24.5    2.0411E-05    0.99979  0.99954  0.99919  0.99874  0.99513  0.98180  0.93623 
0.965         28.1    1.7814E-05    0.99984  0.99965  0.99938  0.99904  0.99626  0.98590  0.94977 
0.970         32.8    1.5230E-05    0.99989  0.99974  0.99955  0.99929  0.99724  0.98952  0.96204 
0.975         39.5    1.2659E-05    0.99992  0.99982  0.99968  0.99951  0.99808  0.99263  0.97288 
0.980         49.5    1.0101E-05    0.99995  0.99989  0.99980  0.99969  0.99877  0.99523  0.98214 
0.985         66.2    7.5568E-06    0.99997  0.99994  0.99989  0.99982  0.99930  0.99728  0.98967 
0.990         99.5    5.0252E-06    0.99999  0.99997  0.99995  0.99992  0.99969  0.99878  0.99528 
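
A minimal Python sketch of the cumulative Poisson calculation behind Table 2 is given here for illustration. It assumes, consistent with the tabulated values, that the element failure rate is the system failure rate (-ln R / t) divided evenly over the baseline count of 2,000 serial elements; the function names are hypothetical.

```python
import math


def element_failure_rate(system_reliability: float, mission_hours: float,
                         baseline_parts: int) -> float:
    """Per-element rate: system rate -ln(R)/t split evenly over serial elements."""
    return -math.log(system_reliability) / (mission_hours * baseline_parts)


def prob_at_most_r_failures(r: int, parts_count: int, failure_rate: float,
                            mission_hours: float) -> float:
    """Cumulative Poisson P(x <= r) with mean N * lambda * t."""
    mean = parts_count * failure_rate * mission_hours
    return sum(mean ** x * math.exp(-mean) / math.factorial(x)
               for x in range(r + 1))


# First row of Table 2: R = 0.940, t = 1 hour, baseline N = 2,000, r = 1.
lam = element_failure_rate(0.940, 1.0, 2000)
for n_ref in (1000, 2000, 20000):
    print(n_ref, round(prob_at_most_r_failures(1, n_ref, lam, 1.0), 5))
```

Under these assumptions the probability of success falls as the parts count grows at a fixed element failure rate, which is the trend across each row of the table.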


Total parts count can be evaluated in two different ways. If the concern is for 
affordability (LCC), the total parts count considers all components that could be considered to have a 
failure mode. Any part failure will result in added maintenance burden and added life cycle cost. 
However, if the concern is for achieving a successful launch on time or for an in-space application for long 
term space flight, only the critical components (parts) that would impact successful mission 
accomplishment should be considered. Because of this difference in objectives, the designer will probably 
want to perform both evaluations so that both objectives can be controlled and 
accomplished by the design process. These attribute relationships and availability can be made more visible 
by examining scenario examples. 


B. An Example of a Space Transportation Application 

Let’s work an example case through this process to show how these aids are used. Assume that, for a 
repairable system, the requirements are a 45-day period (1080 hours) with 0.98 system 
reliability, 98% system availability, and an upper-bound MTTR of 216 hours. This 45-day target may 
represent a desired total time for receiving the hardware at the launch site, integrating the major elements, 
servicing the consumables, installing and connecting any ordnance, and launching the space transportation 
system into space (including approximately 20% for hardware replacement, e.g., MTTR). Table 3 shows 
that the upper-bound MTTR for our example is 1090.98 hours. Because this calculated upper-bound MTTR 
greatly exceeds the 216-hour requirement, we must select either a higher availability or a lower system 
reliability. Using Table 3 again and leaving the 0.98 system reliability requirement unchanged, 
the availability requirement needs to be adjusted upward to ~ 99.9%, providing an upper-bound MTTR 
of 53.51 hours. The other option would be to reduce system reliability to 0.90 to retain the upper-bound 
MTTR requirement of 216 hours. However, when we select a lower reliability, we need to address the 
likelihood (probability) of experiencing additional hardware failures. Table 4 shows that the 
system complexity requirement would be constrained to a maximum of ~ 10,765 critical parts at a 98% or 
better probability of success while predicting the failures to be 2 or fewer parts per event. However, the 
upper-bound MTTR for these 2 parts will only be ~ 209 hours to achieve the availability of 98%. This 
option can be compared to the reliability choice of 0.98, where the critical parts constraint would be ~ 
56,125 vs. the 10,765 with the reliability reduction to 0.90. 
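
The parts-count constraints quoted in this example can be approximated with a short search. The sketch below is illustrative only: it assumes the cumulative Poisson model of Eq. 2 with the element failure rate derived from a 2,000-element baseline (as in Table 4), and the function names and the bisection approach are choices made here, not taken from the original study. Because the mean number of failures is N_ref (-ln R) / 2,000, the mission time cancels out of the bound.

```python
import math


def prob_at_most_r_failures(r: int, mean_failures: float) -> float:
    """Cumulative Poisson P(x <= r) for a given mean number of failures."""
    return sum(mean_failures ** x * math.exp(-mean_failures) / math.factorial(x)
               for x in range(r + 1))


def max_parts_count(system_reliability: float, baseline_parts: int,
                    r: int, min_probability: float) -> int:
    """Largest N_ref with P(failures <= r) >= min_probability, found by bisection."""
    # Mean failures per part over the mission: lambda_i * t = -ln(R) / baseline_parts.
    per_part_mean = -math.log(system_reliability) / baseline_parts
    low, high = 0, 1
    # Grow the upper bracket until the probability target is no longer met.
    while prob_at_most_r_failures(r, high * per_part_mean) >= min_probability:
        high *= 2
    # Invariant: low meets the target, high does not.
    while high - low > 1:
        mid = (low + high) // 2
        if prob_at_most_r_failures(r, mid * per_part_mean) >= min_probability:
            low = mid
        else:
            high = mid
    return low


for reliability in (0.90, 0.98):
    print(reliability, max_parts_count(reliability, 2000, 2, 0.98))
```

The search should land within a fraction of a percent of the ~ 10,765 and ~ 56,125 part limits read from Table 4; small differences are expected because the table uses rounded column values.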
Table 3. Availability shown highlighted as a function of system reliability and mean time to repair 
in hours. 

Availability (A) = Mean Time Between Failure (MTBF) / (MTBF + Mean Time To Repair (MTTR)) 
A = MTBF / (MTBF + MTTR) or MTTR = MTBF (1 - A) / A;  t = 1080 hours 
A family of curves can be created for A = 90% to 99.9% with Sys. Reliability (R) = 0.95 to 0.99996. 
Then MTTR is calculated for each A value. 

MTTR (hours) for each availability (A) 

System                                                                                                  MTBF = 
Reliability           90%         98%         99%      99.50%      99.90%      99.98%     99.996%     -t/ln R 
0.9000           1,138.95      209.19      103.54       51.51       10.26        2.05       0.410     10250.52 
0.9800           5,939.80    1,090.98      539.98      268.63       53.51       10.69       2.138     53458.18 
0.9900          11,939.90    2,193.04    1,085.45      540.00      107.57       21.50       4.299    107459.10 
0.9940          19,939.94    3,662.44    1,812.72      901.81      179.64       35.90       7.179    179459.46 
0.9950          23,939.95    4,397.13    2,176.36    1,082.71      215.68       43.10       8.619    215459.55 
0.9960          29,939.96    5,499.18    2,721.81    1,354.07      269.73       53.90      10.779    269459.64 
0.9980          59,939.98   11,009.38    5,449.09    2,710.85      540.00      107.91      21.579    539459.82 
0.9990         119,939.99   22,029.79   10,903.64    5,424.42    1,080.54      215.94      43.180   1079459.91 
0.9995         239,939.99   44,070.61   21,812.73   10,851.56    2,161.62      431.98      86.382   2159459.95 
0.9998         599,940.00  110,193.06   54,540.00   27,132.96    5,404.86    1,080.11     215.987   5399459.98 
0.9999       1,199,940.00  220,397.14  109,085.45   54,268.64   10,810.27    2,160.32     431.996  10799459.99 


Table 4 also shows that it may be desirable to increase the system reliability if it is 
unreasonable to constrain the parts count below ~ 56,125 with a probability of success greater than ~ 98%. 
If we select a system reliability greater than 0.98 to accommodate an increased parts count constraint, we 
will again need to reassess the availability requirement value of 99.9% to retain the MTTR requirement of 
~ 216 hours. Attention should be paid to the element (part) failure rate requirement needed to attain these 
system reliability values to assure that it is obtainable. 

For the purpose of understanding the impact on LCC, assume that the unplanned work of replacing 
failed hardware (the MTTR needed to perform corrections) falls outside the 45 days (1080 hours). This 20% 
impact on the flow time would require an additional 20% program labor cost per flight operation. Another 
way of assessing the impact is through the productivity of the transportation system: 
8.1 flights per year without the impact of failures vs. 6.759 flights per year with the 20% impact. 
However, if the 98% reliability and 98% availability are retained with an accepted MTTR of 1090.98 
hours, the productivity would be only 4.035 flights per year, which would double the program labor 
cost per flight because it would take twice as long to perform the same total number of flights. Let us not 
forget that there is a constraint on the total number of critical parts, which is influenced by the selection of 
the required reliability value. To provide relief from the constraint on total critical parts allowed, it may be 
desirable to keep the higher reliability value and simply require the increase in availability value from 
98% to 99.9%. This assessment is based on allowing up to 2 critical parts to fail and be corrected within 
the planned time of 45 days. The system design must now accommodate the corrective action of these 2 
failures within the required MTTR value. 
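
The flight-rate figures above follow from simple arithmetic, sketched here in Python for illustration. The sketch assumes a single vehicle processed back to back with 8,760 hours per year, which reproduces the quoted 8.1, 6.759, and 4.035 flights per year; that assumption and the function name are introduced here and are not stated in the original text.

```python
HOURS_PER_YEAR = 8760.0  # assumption: 365 days x 24 hours, one serial flow


def flights_per_year(flow_hours: float, repair_hours: float = 0.0) -> float:
    """Annual flight rate when each flow is extended by unplanned repair time."""
    return HOURS_PER_YEAR / (flow_hours + repair_hours)


baseline = flights_per_year(1080.0)                 # no failure impact
with_20_percent = flights_per_year(1080.0, 216.0)   # MTTR = 216 hours
with_full_mttr = flights_per_year(1080.0, 1090.98)  # MTTR = 1090.98 hours

print(round(baseline, 1), round(with_20_percent, 3), round(with_full_mttr, 3))
# Labor cost per flight scales with the time each flight occupies the workforce.
print(round(baseline / with_full_mttr, 2))
```

The final ratio is close to 2, matching the statement that accepting the 1090.98-hour MTTR would roughly double the program labor cost per flight.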
Table 4. System complexity (parts count) constraint example shown as a function of system 
reliability (0.90 & 0.98) and 98% probability of success of controlling failures to 2 or less / event. 

IF:   Proposed System Has Serial Element Count (N) = 2,000 
      Mission Time (t) = 1,080 
      Mission’s Maximum Failure Count (r) = 2 
Then: Probability Of Success: Failure Count Is r Or Less During t For Various 
      System Complexity Levels (N_ref) Based On λi 

System       System      Element 
Reliability  MTBF        Failure Rate  N_ref = 
(R)                      (λi)            1,000    1,500    2,000    2,500    5,000   10,765   56,125 
0.900         10,250.5   4.8778E-08    0.99998  0.99992  0.99982  0.99965  0.99750  0.98001  0.43297 
0.945         19,091.3   2.6190E-08    1.00000  0.99999  0.99997  0.99994  0.99958  0.99625  0.78658 
0.950         21,055.4   2.3747E-08    1.00000  0.99999  0.99998  0.99996  0.99968  0.99714  0.82389 
0.955         23,455.9   2.1317E-08    1.00000  0.99999  0.99998  0.99997  0.99977  0.99789  0.85893 
0.960         26,456.3   1.8899E-08    1.00000  1.00000  0.99999  0.99998  0.99984  0.99850  0.89107 
0.965         30,313.9   1.6494E-08    1.00000  1.00000  0.99999  0.99999  0.99989  0.99898  0.91974 
0.970         35,457.3   1.4101E-08    1.00000  1.00000  1.00000  0.99999  0.99993  0.99935  0.94438 
0.975         42,657.7   1.1721E-08    1.00000  1.00000  1.00000  0.99999  0.99996  0.99962  0.96457 
0.980         53,458.2   9.3531E-09    1.00000  1.00000  1.00000  1.00000  0.99998  0.99980  0.98002 
0.985         71,458.6   6.9971E-09    1.00000  1.00000  1.00000  1.00000  0.99999  0.99992  0.99072 
0.990        107,459.1   4.6529E-09    1.00000  1.00000  1.00000  1.00000  1.00000  0.99997  0.99697 


To determine the availability of the system during a more critical time in the 
launch operation, we assess the last 16 hours (two work shifts) of the total 45-day flow time 
by adjusting the value for t in our model to 16 hours. For this evaluation, select a system reliability of 0.98 
and a desired availability of 98% with an upper-bound MTTR value of 5 hours for hardware 
replacement. Table 6 shows that a reliability value of 0.98 must be selected to achieve a prediction of one 
or fewer failures within the 16 hours while constraining the critical parts count to 21,250 with 
a minimum 98% probability of success. From Table 5 it can be determined that, with a system reliability 
value of 0.98 (MTBF of ~ 792 hours), the availability must be 99.5% to constrain the MTTR to within 
the desired 5 hours. The first selected availability value of 98% would have allowed an MTTR of ~ 16 
hours, which is not compatible with our requirement. If it is desirable to increase the critical parts constraint 
above 21,250, the system reliability requirement may need to be raised to 0.99 at an availability 
requirement of 99.9% to allow constraining the parts failure potential to one element (part) during this final 
16 hours with a maximum of ~ 5 hours for this repair. 
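
As a cross-check on the table lookup, Eq. 1 can be inverted to give the availability implied by a fixed MTTR bound. The Python sketch below is an illustration under the constant-failure-rate assumption (MTBF = -t / ln R); the function name is hypothetical.

```python
import math


def required_availability(reliability: float, mission_hours: float,
                          max_mttr_hours: float) -> float:
    """Availability implied by an MTTR bound: A = MTBF / (MTBF + MTTR)."""
    mtbf = -mission_hours / math.log(reliability)  # constant failure rate assumed
    return mtbf / (mtbf + max_mttr_hours)


# Final 16 hours of the flow, R = 0.98, repairs limited to 5 hours.
a_min = required_availability(0.98, 16.0, 5.0)
print(f"minimum availability = {a_min:.4%}")
```

Under these assumptions the minimum availability is a little under 99.4%, so the next tabulated level that satisfies the 5-hour bound is the 99.5% column of Table 5, in line with the selection made in the text.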
Table 5. Availability shown highlighted as a function of system reliability and mean time to repair 
in hours. 

Availability (A) = Mean Time Between Failure (MTBF) / (MTBF + Mean Time To Repair (MTTR)) 
A = MTBF / (MTBF + MTTR) or MTTR = MTBF (1 - A) / A;  t = 16 hours 
A family of curves can be created for A = 90% to 99.9% with Sys. Reliability (R) = 0.95 to 0.99996. 
Then MTTR is calculated for each A value. 

MTTR (hours) for each availability (A) 

System                                                                                     MTBF = 
Reliability         90%        98%        99%     99.50%     99.90%     99.97%    99.994%    -t/ln R 
0.9500            34.66       6.37       3.15       1.57       0.31       0.09       0.02     311.93 
0.9800            88.00      16.16       8.00       3.98       0.79       0.24       0.05     791.97 
0.9900           176.89      32.49      16.08       8.00       1.59       0.48       0.10    1591.99 
0.9940           295.41      54.26      26.86      13.36       2.66       0.80       0.16    2658.66 
0.9950           354.67      65.14      32.24      16.04       3.20       0.96       0.19    3191.99 
0.9960           443.55      81.47      40.32      20.06       4.00       1.20       0.24    3991.99 
0.9980           888.00     163.10      80.73      40.16       8.00       2.40       0.48    7992.00 
0.9990         1,776.89     326.37     161.54      80.36      16.01       4.80       0.96   15992.00 
0.9998         8,888.00   1,632.49     808.00     401.97      80.07      24.00       4.80   79992.00 
0.9999        17,776.89   3,265.14   1,616.08     803.98     160.15      48.01       9.60  159992.00 


Table 6. System complexity (parts count) example shown as a function of system reliability (0.98) 
and 98% probability of success of controlling failures to 1 or less per event in time (16 hours). 

IF:   Proposed System Has Serial Element Count (N) = 2,000 
      Mission Time (t) = 16 
      Mission's Maximum Failure Count (r) = 1 
Then: Probability Of Success: Failure Count Is r Or Less During t For Various 
      System Complexity Levels (N_ref) Based On λi 

System       System    Element 
Reliability  MTBF      Failure Rate  N_ref = 
(R)                    (λi)            1,000    1,500    2,000    2,500    5,000   10,000   21,250 
0.940          258.6   1.9336E-06    0.99953  0.99896  0.99816  0.99716  0.98920  0.96096  0.85885 
0.945          282.8   1.7678E-06    0.99961  0.99913  0.99846  0.99761  0.99089  0.96680  0.87775 
0.950          311.9   1.6029E-06    0.99968  0.99928  0.99873  0.99803  0.99245  0.97223  0.89586 
0.955          347.5   1.4389E-06    0.99974  0.99942  0.99897  0.99841  0.99386  0.97724  0.91305 
0.960          391.9   1.2757E-06    0.99979  0.99954  0.99919  0.99874  0.99513  0.98180  0.92918 
0.965          449.1   1.1133E-06    0.99984  0.99965  0.99938  0.99904  0.99626  0.98590  0.94411 
0.970          525.3   9.5185E-07    0.99989  0.99974  0.99955  0.99929  0.99724  0.98952  0.95767 
0.975          632.0   7.9118E-07    0.99992  0.99982  0.99968  0.99951  0.99808  0.99263  0.96970 
0.980          792.0   6.3133E-07    0.99995  0.99989  0.99980  0.99969  0.99877  0.99523  0.98001 
0.985        1,058.6   4.7230E-07    0.99997  0.99994  0.99989  0.99982  0.99930  0.99728  0.98841 
0.990        1,592.0   3.1407E-07    0.99999  0.99997  0.99995  0.99992  0.99969  0.99878  0.99469 


The same relationship to LCC can be seen again here; however, the impact may be greater because during 
this phase of the launch operations range safety and other institutional functions are engaged in the 
operation, driving up its cost. It may be hard to place a value on this additional impact, 
but it will be large. 
V. LCC Impact From Availability Cost Drivers (Reliability and Maintainability) 

A Three-Flow Shuttle Orbiter Component Removal Analysis performed in 1995 provided much insight 
into actual experience. To allow this analysis to be performed in a reasonable time, the 
task was confined to a data set of 3 flows and the top 12 of 59 subsystems [representing about 80% of line 
replaceable unit (LRU) problem reports (PRs)]. 10% of the hardware failures were discovered in-flight (8% 
were not documented as “in-flight anomalies” or IFAs). Approximately 54% of all the LRU failures 
reviewed resulted from required turnaround servicing and normal support functions. Approximately 12% of 
the LRUs reviewed were processing-induced (accessibility would be considered a major cause). It was 
found that about 36% of the reviewed LRU removals were found during test and inspection and would 
otherwise have gone undetected (a significant compromise in safety and mission success). The 
safety/reliability consequences of this level of undetected component failures accumulating over a period 
of flights substantiate the value added by pre-flight test and certification. To save time and money, in-place 
repair can be accomplished to avoid an LRU replacement. A comparison of component removals vs. 
engineering actions to avoid hardware removals indicated LRU PRs = 40% (i.e., component removals) vs. 
material review boards (MRBs) = 60% (i.e., removal avoidance). The average number of LRU 
replacements is approximately 100 per turnaround operation. 

To put the impact on LCC in perspective, this unplanned troubleshooting and repair accounted 
for approximately 24% of the work content during the total turnaround operation between flights of the 
shuttle system. For some fluid systems, this work content was up to 50%. The cost is measured in several 
ways: labor, hardware logistics, and lost productivity of each shuttle orbiter. This loss in transportation 
productivity drives the facility infrastructure needs; because each facility now supports fewer transportation 
system turnaround operations per year, additional facilities are needed to support the annual launch rate. 
These additional facilities require resources for maintenance, repair, and operation, which add to the 
LCC of the space transportation system operation. 
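
The facility effect described above can be illustrated with a minimal Python sketch. It is not taken from the study: the function name, the 60-day planned turnaround, the 12 flights per year, and the assumption that flow time scales directly with work content are all hypothetical. Only the 24% unplanned work fraction comes from the text.

```python
import math

def facilities_needed(flights_per_year, planned_days, unplanned_fraction=0.0,
                      days_per_year=365.0):
    """Estimate processing facilities needed to sustain a launch rate.

    unplanned_fraction is the share of TOTAL turnaround work that is
    unplanned, so total time = planned time / (1 - unplanned_fraction).
    Hypothetical assumption: flow time scales with work content.
    """
    if not 0.0 <= unplanned_fraction < 1.0:
        raise ValueError("unplanned_fraction must be in [0, 1)")
    turnaround_days = planned_days / (1.0 - unplanned_fraction)
    flows_per_facility = days_per_year / turnaround_days
    return math.ceil(flights_per_year / flows_per_facility)

# Hypothetical inputs: 12 flights per year, 60 days of planned work per flow.
baseline = facilities_needed(12, planned_days=60)
with_unplanned = facilities_needed(12, planned_days=60, unplanned_fraction=0.24)
print(baseline, with_unplanned)
```

Under these assumed inputs, the unplanned work raises the facility count from two to three, which is the mechanism by which lost productivity feeds back into LCC.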

VI. Balancing System Safety, Reliability, and Maintainability Requirements Study 

One design technique for increasing mission reliability emphasizes increased redundancy. However, it 
should be noted that this technique, i.e., increasing reliability through redundancy, often results in 
increasing the maintainability burden. 

A notional example that demonstrates how added redundancy, inserted in a design to increase reliability, 
may also increase the maintainability burden is shown in Figure 2. The example cited compares a notional 
triple redundant string (Case 1, left) with a dual redundant string (Case 2, right). The examples use a mission 
reliability goal of 0.999. Case 1 results in a 300:1 added burden on maintainability, whereas Case 2 results 
in a 60:1 added burden. The resulting outcome on recurring cost is different: the added parts count of Case 
1 increases cost compared to Case 2. A thorough appreciation of the coupling of maintainability and 
reliability must be held throughout the design phase, and design teams must strive for low parts counts in 
order to meet critical maintainability objectives. 

Individual design teams often seek to optimize their system, and system engineering staffs often force this 
onto them via the requirements definition process. Design teams must first and foremost meet their 
specifications (be compliant). Program management must look at the total vehicle, its operations, its future 
upgrade paths as a whole system, and be disciplined enough to appreciate the necessity of coupling 
maintainability and reliability goals with performance goals to prevent independent subsystem level 
optimizations from collectively adversely affecting LCC. 

As an example, the initial Shuttle design considerations emphasized performance, and although 
maintainability goals were also set, inadequate internal discipline was exercised. The result was a 
high-performance flight system with high maintenance cost. Recurring costs and maintenance 
costs must be reduced, and an essential factor in that process is the introduction of fewer, but more highly 
reliable, components. The inevitable “growth in parts count” that results from “highly redundant” design 
approaches often leads to maintenance-intensive designs. 

If the element or component reliability requirements to achieve the desired minimum maintainability 
burden become prohibitive, then the efforts must focus on other methods of reducing the parts count. The 
reduction in parts count may be accomplished through improved functional integration of systems, 
restructuring redundant combinations to a minimum, and using the highest element reliability possible. 

The Coupling of Maintainability and Reliability (Redundancy Case) 

Techniques for increasing mission reliability (increased redundancy) should be used only to 
achieve safety goals desired above and beyond that reliability required to produce the lowest life 
cycle cost, avoiding added maintainability burden. 

Simple Example Comparisons 
(Each with common reliability/safety requirements, but very different maintainability 
burdens and recurring cost outcomes) 

Case 1: Reliability objective set = 0.999, with a single-string component two orders of magnitude more 
reliable than each of the three parallel components 

• 1:1000 = 3 of 1:10 in parallel = 0.999 reliability objective ($R = 1 - (1/10)^3 = 0.999$) 

• Result: 300:1 added burden on maintainability for triple redundancy vs. a single higher-reliability 
component 

Case 2: Seeks a dual redundant solution that meets the reliability objective set = 0.999 

• 1:1000 = 2 of 1:32 in parallel = 0.999 reliability objective ($R = 1 - (1/32)^2 \approx 0.999$) 

• Result: 60:1 added burden on maintainability for dual redundancy vs. a single higher-reliability component 

Figure 2. Coupling of Maintainability and Reliability: Sample Case Study. 
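
The arithmetic behind the two cases in Figure 2 can be checked with a short Python sketch. The function names are hypothetical, and two assumptions are made that the figure does not state explicitly: the parallel units fail independently, and the "added burden" is counted as expected unit failures per mission for the redundant string relative to a single 1:1000 component. Under those assumptions the sketch gives 300:1 for Case 1 and about 62.5:1 for Case 2, which is consistent with the approximate 60:1 quoted.

```python
def parallel_reliability(unit_failure_prob, n_units):
    """Mission reliability of n identical, independent units in parallel,
    where the function is lost only if every unit fails."""
    return 1.0 - unit_failure_prob ** n_units

def maintainability_burden_ratio(unit_failure_prob, n_units, single_failure_prob):
    """Expected unit failures per mission for the redundant string,
    relative to a single higher-reliability component (assumed measure)."""
    return (n_units * unit_failure_prob) / single_failure_prob

single = 1.0 / 1000.0  # single component with a 1:1000 failure probability

# Case 1: three 1:10 units in parallel.
case1_r = parallel_reliability(1.0 / 10.0, 3)
case1_burden = maintainability_burden_ratio(1.0 / 10.0, 3, single)

# Case 2: two 1:32 units in parallel.
case2_r = parallel_reliability(1.0 / 32.0, 2)
case2_burden = maintainability_burden_ratio(1.0 / 32.0, 2, single)

print(f"Case 1: R = {case1_r:.4f}, burden = {case1_burden:.1f}:1")
print(f"Case 2: R = {case2_r:.4f}, burden = {case2_burden:.1f}:1")
```

Both strings meet the 0.999 objective, yet every unit failure in either string still generates a repair action, which is why redundancy built from low-reliability parts raises the maintainability burden.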

The “maintainability burden” is a major driver of recurring cost. Often a key element in the recurring 
cost burden is a direct function of the number of parts. Part counts increase when the reliability of the 
selected parts is not sufficient to meet safety needs and redundancy is instituted to achieve reliability goals. 
Part counts are also increased by the lack of functional systems integration. To achieve recurring cost 
objectives, highly reliable parts must be used in the design, which in turn leads to lower repair requirements 
that will drive down the maintainability requirement. Figures 3 and 4 show these effects. 


[Figure 3 (notional plot): system cost dependence on reliability, maintainability, and tradeoffs. 
Vertical axes: nonrecurring cost ($), comprising DDT&E, system HW/SW procurement, integrated logistics 
support startup, and facilities startup; and recurring cost ($), comprising system ops, integrated logistics 
support, and preplanned product improvement. Horizontal axis: system elements reliability (MTBF), 
from 10^1 to 10^5.] 

Figure 3. Flight System Cost Dependence on Reliability, Maintainability. 

Figure 4. Mission Reliability, Repairs over Flight Rate Period and MTBF. 

Thus it is imperative to conduct early technology development programs to achieve the availability of 
high reliability parts. 

Achieving this balance of design life requirements with safety and maintainability objectives requires a 
process such as that shown in Figure 5 for developing and balancing quantitative safety, reliability, and 
maintainability requirements. Achieving the appropriate balance among these factors to achieve low 
recurring cost requires a thorough understanding of subsystem element reliability, subsystem element failure 
rate, and the number of serial system elements. 
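
How these three quantities interact can be sketched in Python under a deliberately simple model. The model is an assumption, not the paper's method: identical elements in series, a constant failure rate (exponential reliability), and hypothetical values for element count, mission duration, and number of flights.

```python
import math

def series_mission_reliability(n_elements, mtbf_hours, mission_hours):
    """Mission reliability of n identical serial elements, assuming a
    constant failure rate of 1/MTBF per element (exponential model)."""
    return math.exp(-n_elements * mission_hours / mtbf_hours)

def expected_repairs(n_elements, mtbf_hours, mission_hours, flights):
    """Expected element failures (repairs) accumulated over a number of
    flights, under the same constant-failure-rate assumption."""
    return n_elements * mission_hours * flights / mtbf_hours

# Hypothetical system: 100 serial elements, 10-hour missions, 50 flights.
for mtbf in (1e3, 1e4, 1e5):
    r = series_mission_reliability(100, mtbf, mission_hours=10.0)
    k = expected_repairs(100, mtbf, mission_hours=10.0, flights=50)
    print(f"MTBF = {mtbf:8.0f} h  mission reliability = {r:.4f}  repairs = {k:.1f}")
```

In this model both mission reliability and the repair count depend on the ratio of element count to element MTBF, so reducing the number of serial elements has the same effect as raising element reliability by the same factor.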

VII. Conclusion and Recommendations 

In order to reduce the LCC of a future space transportation system, a rigorous process of Design-to-LCC 
Management must be implemented early in the vehicle design program and refined during its 
duration. Life cycle cost, as a part of the design team’s technical performance metrics (TPM), must be 
managed from the start of the design definition phase and carried through the operational life of the 
vehicles. It has been shown that a lower LCC can be achieved through the correct allocation of vehicle 
availability cost drivers: maintainability and reliability. However, system safety, reliability, and 
maintainability must be carefully quantified and traded in the design process so that they do not 
adversely affect one another. Vehicle system engineers must look at the total vehicle, its operations, and its 
future upgrade paths as a whole system, to prevent independent subsystem level optimization from 
collectively adversely affecting the overall system LCC. 

The availability requirement cannot be worked independently of the MTBF and MTTR requirements 
that influence it, or of the constraint on the total parts count of the system being 
designed. These requirements must be developed together and maintained throughout the design process 
with an understanding of all their relationships. If a design analysis capability such as the one discussed in 
this paper, or today’s reliability and maintainability tools, is used in the design, development, test, and 
evaluation (DDT&E) phase, the availability requirement, the MTTR requirement, the MTBF requirement, 
probability of success, affordability, and safety can all be controlled by design. However, because of their 
relationships to each other, they must be worked and developed together to provide the correct 
understanding and control to meet all of the objectives. These analyses must also be performed during concept 
development and be available as requirement inputs before proceeding with the detailed design. Traditionally, 
these reliability and maintainability assessments and adjustments to the design are performed much too late 
in the process to provide an enabling benefit to an affordable space transportation system. We cannot 
overstress the need to have this activity involved in developing the requirements at the start of the 
concept/architecture development and throughout the DDT&E phase. 
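
The coupling among these requirements follows directly from the definition of inherent availability given earlier in the paper, MTBF divided by the sum of MTBF and MTTR. The Python sketch below is illustrative only; the function names and the numerical values are hypothetical. It shows that once an availability target and an achievable MTBF are fixed, the allowable MTTR is no longer a free choice.

```python
def inherent_availability(mtbf, mttr):
    """Inherent availability = MTBF / (MTBF + MTTR), both in the same time units."""
    return mtbf / (mtbf + mttr)

def max_mttr_for_availability(target_availability, mtbf):
    """Largest MTTR that still meets the availability target for a given MTBF,
    obtained by solving A = MTBF / (MTBF + MTTR) for MTTR."""
    if not 0.0 < target_availability <= 1.0:
        raise ValueError("target_availability must be in (0, 1]")
    return mtbf * (1.0 - target_availability) / target_availability

# Hypothetical allocation: a 0.95 availability target at two candidate MTBF levels.
for mtbf_hours in (200.0, 2000.0):
    allowed = max_mttr_for_availability(0.95, mtbf_hours)
    achieved = inherent_availability(mtbf_hours, allowed)
    print(f"MTBF = {mtbf_hours:6.0f} h  allowable MTTR = {allowed:6.1f} h  "
          f"availability = {achieved:.3f}")
```

A tenfold improvement in MTBF relaxes the allowable repair time tenfold for the same availability, which is one way to see why the requirements have to be set together rather than independently.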


Figure 5. Process for Developing and Balancing Quantitative Safety, Reliability and Maintenance 
Requirements. 

Additional benefits can be achieved by selecting technologies that provide major reductions in 
total parts count. One example is to select direct electro-mechanical control instead of using an 
electro-mechanical device to control an intermediate fluid that then performs the function, e.g., an 
electro-mechanical valve controlling fluid flow directly vs. a hydraulically or pneumatically 
operated valve whose working fluid is in turn controlled by a solenoid valve. Another example is the use 
of common fluids for propulsion applications, which allows an integrated system 
solution with only one fluid container and would provide a major reduction in total parts count. When 
criticality drives the design to redundant hardware solutions, the hardware selected should 
always have the best element reliability possible to provide the lowest maintenance burden and thus lower 
life cycle costs. In all of the above examples, the resultant DDT&E and operational costs will be reduced along 
with the achievement of the highest overall system reliability and safety. This will also lead to a higher 
availability of the system, thus enabling mission success. 

In summary, added emphasis in any system development on inherent reliability, insofar as 
it addresses both parts count and MTBF, will inevitably improve performance, safety, and operational 
affordability. Performance is improved when fewer, better parts are used, and they should weigh less 
overall. Weight margin at the start of the program should be large enough to accommodate common 
hardware, in many cases off-the-shelf from the aircraft industry, to allow selection of demonstrated 
high-reliability hardware. Safety will be improved, as hardware that fails less during integration, checkout, and 
servicing inevitably will perform better in actual use. Affordability is helped because better performance 
makes each flight more productive or allows more flights given shorter process or production intervals. 
Ultimately, hardware that cannot be counted on to function during processing, regardless of redundancies, 
cannot be expected to function well in flight. All that is lacking is the non-recurring investment up front 
that focuses on sufficient generic technology. This initial investment is then justified when numerous 
subsequent users take advantage of it. An additional benefit is that the productivity of the facility 
infrastructure is improved, which will require fewer facilities and again result in lower LCC. 